Transformers have achieved impressive success on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with a Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To maximally exploit the Transformer on limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer is driven to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
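As a rough illustration of the BYOL-style objective BOLT builds on, the toy sketch below uses a hypothetical single-scalar "network" per branch: the online branch is trained to predict the target branch's representation of the same tokens under a different perturbation, while the target weights track the online weights via an exponential moving average (EMA). All names and the gradient derivation are illustrative, not the paper's implementation.

```python
import random

def perturb(tokens, noise=0.1, seed=None):
    """Apply a random additive perturbation to a token sequence."""
    rng = random.Random(seed)
    return [t + rng.uniform(-noise, noise) for t in tokens]

def encode(tokens, weight):
    """Toy stand-in for a branch network: one scalar weight per branch."""
    return [weight * t for t in tokens]

def train_step(tokens, w_online, w_target, lr=0.01, momentum=0.99):
    """One self-supervised step: online predicts the target view; target EMA-updates."""
    view_online = perturb(tokens, seed=0)
    view_target = perturb(tokens, seed=1)   # a *different* perturbation
    pred = encode(view_online, w_online)
    target = encode(view_target, w_target)  # no gradient flows through this branch
    # Gradient of the MSE w.r.t. w_online, hand-derived for the scalar toy model.
    grad = sum(2 * (p - t) * v for p, t, v in zip(pred, target, view_online)) / len(tokens)
    w_online -= lr * grad
    # The target branch slowly follows the online branch via EMA.
    w_target = momentum * w_target + (1 - momentum) * w_online
    return w_online, w_target

tokens = [0.5, -1.2, 0.8, 2.0]
w_online, w_target = 1.5, 1.0
for _ in range(200):
    w_online, w_target = train_step(tokens, w_online, w_target)
print(abs(w_online - w_target) < 0.1)  # the two branches converge together
```

The auxiliary difficulty ranking task of BOLT (deciding which branch received the harder perturbation) would sit on top of such a two-branch setup and is not sketched here.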
Multivariate time series forecasting constitutes important functionality in cyber-physical systems, and its prediction accuracy can be improved significantly by capturing temporal and multivariate correlations among multiple time series. State-of-the-art deep learning methods fail to construct models for full time series because model complexity grows exponentially with time series length. Rather, these methods construct local temporal and multivariate correlations within subsequences, but fail to capture correlations among subsequences, which significantly affects their forecasting accuracy. To capture the temporal and multivariate correlations among subsequences, we design a pattern discovery model that constructs correlations via diverse pattern functions. Whereas traditional pattern discovery methods use shared and fixed pattern functions that ignore the diversity across time series, we propose a novel pattern discovery method that automatically captures diverse and complex time series patterns. We also propose a learnable correlation matrix that enables the model to capture distinct correlations among multiple time series. Extensive experiments show that our model achieves state-of-the-art prediction accuracy.
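To make the multivariate-correlation idea concrete, the hypothetical snippet below computes an empirical Pearson correlation matrix among several series; this is the kind of cross-series structure the paper's learnable correlation matrix is meant to capture, except that a learnable version treats the entries as trainable parameters rather than fixed statistics.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def correlation_matrix(series):
    """Pairwise correlation matrix over a list of series."""
    return [[pearson(a, b) for b in series] for a in series]

series = [
    [1.0, 2.0, 3.0, 4.0],    # rising
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first
    [4.0, 3.0, 2.0, 1.0],    # anti-correlated with the first
]
C = correlation_matrix(series)
print(round(C[0][1], 3), round(C[0][2], 3))  # → 1.0 -1.0
```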
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
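The k-fold cross-validation scheme the survey asks about (performed by only 37% of participants) amounts to splitting the training indices into k folds, each serving once as the validation set. A minimal sketch, with an illustrative round-robin fold assignment:

```python
def kfold_indices(n_samples, k):
    """Yield (train, val) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)

splits = list(kfold_indices(10, 5))
print(len(splits))      # 5 folds in total
print(splits[0][1])     # validation indices of the first fold
```

In practice, library implementations (e.g., with shuffling and stratification) would be preferred over this hand-rolled split.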
Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, requiring the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework (ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks (SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performance on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.
This paper proposes a graph-based approach to representing spatio-temporal trajectory data that allows an effective visualization and characterization of city-wide traffic dynamics. With the advance of sensor, mobile, and Internet of Things (IoT) technologies, vehicle and passenger trajectories are being collected on an increasingly massive scale and are becoming a critical source of insight into traffic patterns and traveller behaviour. To leverage such trajectory data to better understand traffic dynamics in a large-scale urban network, this study develops a trajectory-based network traffic analysis method that converts individual trajectory data into a sequence of graphs that evolve over time (known as dynamic graphs or time-evolving graphs) and analyses network-wide traffic patterns in terms of a compact and informative graph representation of aggregated traffic flows. First, we partition the entire network into a set of cells based on the spatial distribution of data points in individual trajectories, where the cells represent spatial regions between which aggregated traffic flows can be measured. Next, dynamic flows of moving objects are represented as a time-evolving graph, where regions are graph vertices and flows between them are treated as weighted directed edges. Given a fixed set of vertices, edges can be inserted or removed at every time step depending on the presence of traffic flows between two regions in a given time window. Once a dynamic graph is built, we apply graph mining algorithms to detect change-points in time, which represent time points where the graph exhibits significant changes in its overall structure and, thus, correspond to change-points in city-wide mobility patterns throughout the day (e.g., global transition points between peak and off-peak periods).
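The first two steps of the pipeline can be sketched as follows. This is a hypothetical simplification: it maps trajectory points onto a uniform grid (the paper derives cells from the spatial distribution of the data) and counts cell-to-cell transitions per time window to obtain the weighted directed edges of a time-evolving graph.

```python
from collections import defaultdict

def to_cell(x, y, cell_size=1.0):
    """Map a coordinate to a grid-cell id (illustrative uniform partition)."""
    return (int(x // cell_size), int(y // cell_size))

def build_dynamic_graph(trajectories, window=2, cell_size=1.0):
    """Return {window_id: {(cell_from, cell_to): flow_count}}.

    Each trajectory is a time-ordered list of (t, x, y) points; consecutive
    points in different cells contribute one unit of directed flow.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
            c0, c1 = to_cell(x0, y0, cell_size), to_cell(x1, y1, cell_size)
            if c0 != c1:                      # only inter-cell flows become edges
                graphs[t1 // window][(c0, c1)] += 1
    return graphs

trajs = [
    [(0, 0.5, 0.5), (1, 1.5, 0.5), (2, 2.5, 0.5)],
    [(0, 0.2, 0.8), (1, 1.6, 0.9)],
]
g = build_dynamic_graph(trajs)
print(g[0][((0, 0), (1, 0))])  # → 2: both objects crossed cell (0,0) -> (1,0) in window 0
```

Change-point detection would then compare these per-window edge-weight snapshots over time.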
Successful point cloud registration relies on accurate correspondences established upon robust descriptors. However, existing neural descriptors either leverage a rotation-variant backbone, whose performance declines under large rotations, or encode local geometry, which is less distinctive. To address this issue, we introduce RIGA to learn descriptors that are Rotation-Invariant by design and Globally-Aware. From the point-pair features (PPFs) of sparse local regions, rotation-invariant local geometry is encoded into geometric descriptors. Global awareness of both 3D structure and geometric context is subsequently incorporated in a rotation-invariant fashion. More specifically, the whole 3D structure of the scene is first represented by our global PPF signatures, from which structural descriptors are learned to help the geometric descriptors sense the 3D world beyond local regions. The geometric context of the entire scene is then globally aggregated into the descriptors. Finally, the descriptors of sparse regions are interpolated to dense point descriptors, from which correspondences are extracted for registration. To validate our approach, we conduct extensive experiments on both object- and scene-level data. Under large rotations, RIGA surpasses the state-of-the-art method by a margin of 8 degrees in terms of the relative rotation error on ModelNet40 and improves the feature matching recall by at least 5 percentage points on 3DLoMatch.
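The rotation invariance of the point-pair feature itself can be demonstrated directly. In the standard formulation, for two oriented points (p1, n1) and (p2, n2) with d = p2 - p1, the feature F = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) is unchanged by any rigid rotation of the scene. A minimal sketch (the specific points and the z-axis rotation are illustrative):

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle between two vectors, clamped for numerical safety."""
    return math.acos(max(-1.0, min(1.0, dot(a, b) / (norm(a) * norm(b)))))

def ppf(p1, n1, p2, n2):
    """Point-pair feature of two oriented points."""
    d = sub(p2, p1)
    return (norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))

p1, n1 = [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]
p2, n2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
f = ppf(p1, n1, p2, n2)

# Rotate the whole pair by 90 degrees about z: the feature is identical.
def rot_z(v): return [-v[1], v[0], v[2]]
f_rot = ppf(rot_z(p1), rot_z(n1), rot_z(p2), rot_z(n2))
print(all(abs(a - b) < 1e-9 for a, b in zip(f, f_rot)))  # → True
```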
Point-pair features (PPFs) are widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted downsampling strategy that focuses more on edge areas in order to extract complex geometry efficiently. A pose hypothesis verification method is proposed to resolve symmetry ambiguity by computing an edge matching degree. We evaluate our approach on two challenging datasets and one real-world collected dataset, demonstrating its superiority for pose estimation of geometrically complex, occluded, symmetric objects. We further validate our method by applying it to simulated puncture.
In unsupervised domain adaptation (UDA), direct adaptation from the source to the target domain usually suffers from significant discrepancy and leads to insufficient alignment. Therefore, many UDA works attempt to narrow the domain gap gradually and gently via various intermediate spaces, which is called domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions mainly rely on rough style transfer, and how to elegantly bridge domains remains under-explored. In this work, we resort to data mixing to establish deliberated domain bridging (DDB) for DASS, through which the joint distributions of the source and target domains are aligned and interacted with each other in intermediate spaces. At the heart of DDB lie a dual-path domain bridging step, which generates two intermediate domains using coarse-wise and fine-wise data mixing techniques, and a cross-path knowledge distillation step, which takes two complementary models trained on the generated intermediate samples as 'teachers' to develop a superior 'student' in a multi-teacher distillation manner. These two optimization steps work in an alternating way and reinforce each other, giving rise to DDB with strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings demonstrate that our DDB significantly outperforms state-of-the-art methods. Code is available at https://github.com/xiaoachen98/DDB.git.
The continued digitization of societal processes translates into a proliferation of time series data that cover applications such as fraud detection, intrusion detection, and energy management, where anomaly detection is often essential for enabling reliability and safety. Many recent studies target anomaly detection for time series data. Indeed, the area of time series anomaly detection is characterized by diverse data, methods, and evaluation strategies, and comparisons in existing studies consider only part of this diversity, which makes it difficult to select the best method for a particular problem setting. To address this shortcoming, we introduce taxonomies for data, methods, and evaluation strategies, provide a comprehensive overview of unsupervised time series anomaly detection using the taxonomies, and systematically evaluate and compare state-of-the-art traditional as well as deep learning techniques. In an empirical study using nine publicly available datasets, we apply the most commonly used performance evaluation metrics to typical methods under a fair implementation standard. Based on the structure provided by the taxonomies, we report on the empirical study and provide guidelines, in the form of comparison tables, for selecting the methods best suited for particular application settings. Finally, we propose research directions for this dynamic field.
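As a concrete baseline from the simplest family of unsupervised detectors such surveys cover, the hypothetical sketch below scores each point by its z-score within a trailing window and flags points whose score exceeds a threshold; the window size and threshold are illustrative choices, not values from the paper.

```python
import statistics

def zscore_anomalies(series, window=5, threshold=3.0):
    """Flag points far from the mean of their trailing window (in std units)."""
    flags = []
    for i, x in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < 2:
            flags.append(False)   # not enough history to score
            continue
        mu = statistics.mean(hist)
        sd = statistics.stdev(hist)
        flags.append(sd > 0 and abs(x - mu) / sd > threshold)
    return flags

series = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 8.0, 1.0]
flags = zscore_anomalies(series)
print(flags.index(True))  # → 6: only the spike is flagged
```

Deep learning detectors replace the window statistics with learned models of normal behaviour, but the scoring-and-thresholding structure is the same.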
Anomaly detection with prior knowledge of only normal samples attracts increasing attention because of the lack of anomaly samples. Existing CNN-based pixel reconstruction approaches suffer from two problems. First, the reconstruction source and target are raw pixel values, which contain indistinguishable semantic information. Second, CNNs tend to reconstruct both normal samples and anomalies well, making them still hard to distinguish. In this paper, we propose the Anomaly Detection TRansformer (ADTR), which applies a transformer to reconstruct pretrained features. The pretrained features contain distinguishable semantic information, and adopting a transformer limits how well anomalies can be reconstructed, such that anomalies are easily detected once reconstruction fails. Moreover, we propose novel loss functions to make our approach compatible both with the normal-sample-only case and with the case of available anomalies carrying image-level and pixel-level labels. The performance can be further improved by adding simple synthetic or external irrelevant anomalies. Extensive experiments are conducted on anomaly detection datasets including MVTec-AD and CIFAR-10. Our method achieves superior performance compared with all baselines.
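The detection rule behind such reconstruction-based methods can be shown in a toy form. In the hypothetical sketch below, the "model" can only reproduce normal feature patterns (here a fixed prototype stands in for a trained transformer), so anomalous features yield large reconstruction errors and hence high scores.

```python
def reconstruct_normal(feature, normal_prototype):
    """Toy stand-in for a model constrained to normal patterns: a real model
    would regress the reconstruction, but could only express normal features."""
    return normal_prototype

def anomaly_score(feature, normal_prototype):
    """Euclidean distance between a feature and its reconstruction."""
    recon = reconstruct_normal(feature, normal_prototype)
    return sum((f - r) ** 2 for f, r in zip(feature, recon)) ** 0.5

prototype = [1.0, 0.0, 1.0]
normal = [1.1, 0.1, 0.9]      # close to the normal pattern -> small error
abnormal = [3.0, 2.0, -1.0]   # far from any normal pattern -> large error
print(anomaly_score(normal, prototype) < anomaly_score(abnormal, prototype))  # → True
```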